---
title: "2017 Taipower 'Energy Sustainability' Hackathon"
output:
  flexdashboard::flex_dashboard:
    source_code: embed
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE
)
library(flexdashboard)
library(data.table)
library(dplyr)
library(ggplot2)
library(plotly)
library(highcharter)
library(DT)
# Load the raw data after clustering
tp_cluster <- read.csv("/Users/yangpeiwen/Documents/政大/台電比賽/taipei456_cluster .csv",fileEncoding = "utf-8")
power <- tp_cluster %>% dplyr::select(1,戶均用電,分群)
# Radar chart data
cluster_rader<-read.csv("/Users/yangpeiwen/Documents/政大/台電比賽/radar_plot.csv",fileEncoding = 'utf8')
```
Motivation
=====================================
Electricity shortage is knocking on our doors
In Taiwan, insufficient electricity supply has long been an important issue. In summer especially, we are often at risk of power outages, and sometimes they really happen [(8/17 Power Outage in Taiwan)](https://udn.com/news/story/11419/2644282). Trends in both demand and supply stability imply that electricity shortage will become an even more serious problem.
Electricity demand is rising
Over the past 4 years, electricity demand in Taiwan has increased by about 8%. Furthermore, our government intends to shut down all nuclear power plants by 2025, even though they account for 14% of total electricity supply.
Electricity supply is becoming unstable
The operating reserve rate keeps decreasing and has missed the target set by Taipower for 4 years in a row. A low operating reserve rate means that any unexpected power plant shutdown could cause a power outage in Taiwan.
Current policies are well-intentioned but ineffective
Although the government has enforced a number of policies to solve this problem, they have not been effective. Examples include [Save the electricity on your own](http://energy-smartcity.energypark.org.tw/) and the [Countrywide Electricity Saving Competition](http://energy-2016summer.energypark.org.tw/). However, policymakers have not clarified why electricity is wasted in different regions. Without understanding the actual causes of the waste, how can they come up with an appropriate policy to curb it?
Our goal
The process of enforcing electricity-saving policies can be broken down into 3 parts: identify the regions wasting electricity, figure out why they waste it, and set up corresponding policies. Our product is designed to shorten the time this process takes and help policymakers apply the right policies to the right regions.
How do we achieve our goal? U-Optimizer
Customizing policies by region is the key to achieving our goal. We used a lot of open data, including demographic data, economic data, and electricity usage data. In addition, we used both supervised and unsupervised machine learning methods to cluster and analyze the villages around Taiwan. In the end, we want to deliver a system that helps policymakers assign current policies to the appropriate villages or set up entirely new policies for specific villages.
Data {#data-describe data-navmenu="Analysis"}
=====================================
Taiwan Open Data
- [Taiwan Power Company Open Data](http://www.taipower.com.tw/content/announcement/ann01.aspx?BType=31)
- [Taiwan Educational Level](http://data.gov.tw/node/8409)
- [Income Tax Data](http://data.gov.tw/node/17983)
- [Population Data](http://data.moi.gov.tw/MoiOD/Data/DataDetail.aspx?oid=F4478CE5-7A72-4B14-B91A-F4701758328F)
- [Household Data](http://data.moi.gov.tw/MoiOD/Data/DataDetail.aspx?oid=F4478CE5-7A72-4B14-B91A-F4701758328F)
- [Ranks of Total Retail Sales](https://moeagis.carto.com/viz/b5e9f4e8-dc7c-11e6-8815-0ef24382571b/public_map)
- [Registered Business Sectors](http://ronnywang-twcompany.s3-website-ap-northeast-1.amazonaws.com/index.html)
- [Real Estate Actual Price Registration Data](http://plvr.land.moi.gov.tw/DownloadOpenData)
- [Park Data](https://sheethub.com/data.taipei.gov.tw/%E8%87%BA%E5%8C%97%E5%B8%82%E9%84%B0%E9%87%8C%E5%85%AC%E5%9C%92%E9%BB%9E%E4%BD%8D%E8%B3%87%E6%96%99)
---------------
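Detailed Description
- The analysis uses non-commercial electricity usage data from July and August 2016.
- Education level data from 2016 is used as a reasonable estimate of education levels in July and August 2016.
- The income tax data covers 2013; for estimation we assume the population structure in 2013 is similar to that in 2016.
- The latest population statistics are from 2015; we reasonably assume the population structure in 2015 and 2016 is similar, so 2015 data is used to estimate 2016.
- Housing statistics from the third quarter of 2016 are used.
- Village-level rankings of 2016 retail sales are used.
- Number of registered businesses, cumulative through July 2017.
- Actual price registration data from 2014 to 2016 is used.
- Number and area (in ping) of parks, cumulative through July 2015.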
Analysis Process {#analysis2 data-navmenu="Analysis"}
=====================================
Analysis Process
1. Clean the nine data sets separately.
2. Merge the cleaned data sets into one with 111 variables.
3. Use "Greedy Search" and "K-means" to filter out useful variables.
4. Calculate each region's variation rate, and find the target.
5. Use a "Decision tree" to find the characteristics of the villages that waste electricity.
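The clustering and decision-tree steps listed above can be sketched roughly as follows. This is only an illustration on simulated data, not the team's actual pipeline: the data frame `villages`, its column names, the choice of five centers, and the "more than one standard deviation above the cluster mean" rule for flagging waste are all assumptions made for the example, and the greedy variable search is omitted.

```r
# Illustrative sketch only: simulated data and hypothetical column names.
library(rpart)  # recommended package shipped with standard R distributions

set.seed(42)
n <- 300
villages <- data.frame(
  single_rate      = runif(n),
  dependency_ratio = runif(n),
  house_age        = runif(n, 5, 50),
  med_income       = rnorm(n, 700, 150)
)
# Simulated electricity use per household (assumed relationship plus noise)
villages$electricity <- 500 + 0.5 * villages$med_income +
  5 * villages$house_age + rnorm(n, sd = 80)

# Cluster villages on standardized profile variables with k-means
profile <- scale(villages[, c("single_rate", "dependency_ratio")])
km <- kmeans(profile, centers = 5, nstart = 20)
villages$cluster <- factor(km$cluster)

# Assumed rule: flag villages more than one SD above their own cluster mean
cluster_mean <- ave(villages$electricity, villages$cluster, FUN = mean)
cluster_sd   <- ave(villages$electricity, villages$cluster, FUN = sd)
villages$waste <- factor(
  ifelse(villages$electricity > cluster_mean + cluster_sd, "waste", "normal"),
  levels = c("normal", "waste")
)

# Decision tree describing which characteristics mark the flagged villages
tree <- rpart(waste ~ house_age + med_income + single_rate + dependency_ratio,
              data = villages, method = "class")
print(tree)
```

Comparing each village with its own cluster, rather than with the city-wide mean, keeps structurally different areas from being flagged merely for their household composition.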
Processed Data {#analysis1 data-navmenu="Analysis"}
=====================================
### Processed Data
```{r}
tp_cluster %>%
select(行政區域,人口數,戶均用電,青少年人口,壯年人口,老年人口,綜合所得總額,中位數,房價中位數,商家數,公園數,公園坪數,每戶平均人數,博士比例,碩士比例,大學比例,大學以下比例,平均屋齡,零售排名, 每戶平均老年人口數.人.,有偶比例...) -> origin
colnames(origin) = c("Region", "Population", "Electricity/household", "teenager", "Prime age", "elderly", "Total income", "Med_Income","Med_housing_price", "Business Sectors", "park", "Park pings", "population/household", "Ph.D.%","Master%", "College%", "Under college%", "House age", "retail rank", "Older_population/household", "Marriage_rate")
origin %>%
DT::datatable(options = list(pageLength = 30)) %>%
formatRound(c(2,13:18,20,21),digits = 2)
```
Radar Data {#analysis3 data-navmenu="Analysis"}
=====================================
Column {data-width=620}
-------------------------------------
### Radar Data
```{r}
tp_cluster %>% mutate(用電 = log(戶均用電),
扶養比 = (青少年人口+老年人口)/壯年人口,
所得中位log = log10(中位數),
老年比例 = 每戶平均老年人口數.人./每戶平均人數,
單身率 = 1 - 有偶比例...) %>%
select(行政區域, 分群, X1人一宅宅數比例... ,X6人以上一宅宅數比例..., 房價中位數,大學以下比例, 扶養比,老年比例 ,單身率, 戶均用電) -> radar_data
names(radar_data) = c("Region","Cluster", "1_live_rate", "6_live_rate", "Med_housing_price", "Under_college%", "Dependency_ratio","Eldery_one", "Single_rate", "Electricity")
radar_data$Cluster[is.na(radar_data$Cluster)] <- "第一群"
radar_data %>%
DT::datatable(
options = list(pageLength = 30)) %>% formatRound(3:9,digits = 3)
```
Column {data-width=380}
-------------------------------------
### Index Variables
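1. 1_live_rate
Proportion of dwellings with one resident
2. 6_live_rate
Proportion of dwellings with six or more residents
3. Med_housing_price
Median housing price: the median per-ping transaction price from actual price registration in the village
4. Under_college%
Proportion of residents with below-college education
5. Dependency_ratio
Dependency ratio: (youth population + elderly population) / working-age population
- Youth population: ages 0 to 14
- Working-age population: ages 15 to 64
- Elderly population: ages 65 and above
6. Eldery_one
Proportion of elderly per household: average number of elderly per household / average household size
7. Single_rate
Single rate: 1 - married proportion
8. Electricity
Electricity usage: log-transformed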
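As a small worked example of the derived indices defined above, the sketch below computes them for two made-up villages. The input column names are hypothetical English stand-ins for the original columns, the numbers are invented, and the married proportion is assumed to be a fraction between 0 and 1.

```r
# Toy input with hypothetical column names and invented values
village <- data.frame(
  youth = c(120, 300),                      # ages 0 to 14
  working_age = c(800, 1500),               # ages 15 to 64
  elderly = c(200, 150),                    # ages 65 and above
  elderly_per_household = c(0.5, 0.2),
  persons_per_household = c(2.5, 3.2),
  married_rate = c(0.45, 0.60),             # assumed to be a fraction
  electricity_per_household = c(650, 900)
)

# Derived indices, following the definitions in the list above
indices <- with(village, data.frame(
  Dependency_ratio = (youth + elderly) / working_age,
  Eldery_one       = elderly_per_household / persons_per_household,
  Single_rate      = 1 - married_rate,
  Electricity      = log(electricity_per_household)
))
print(indices)
```

For the first village the dependency ratio is (120 + 200) / 800 = 0.4, and for the second it is (300 + 150) / 1500 = 0.3.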
Cluster{#cluster}
=====================================
Column {.tabset .tabset-fade}
-------------------------------------
### Multi Resident Housing
```{r}
## color
#col.raw <- c("#1d3156","#ff9c63","#7dbfc6","#00b1c9","#ea8ca7","#ffd2a0")
#col.raw <- c("#1d3156","#66c2a5","#ffd92f","#fc8d62","#e78ac3","#8da0cb")
col.raw <- c("#1d3156","#984ea3","#4daf4a","#ff7f00","#e41a1c","#377eb8")
## cluster 1 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 1 : Multi Resident Housing ") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 1 ",
data = cluster_rader$第一群,
pointPlacement = 'on',color=col.raw[2]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### Disadvantaged Family
```{r}
## cluster 2 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 2: Disadvantaged Family") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 2 ",
data = cluster_rader$第二群,
pointPlacement = 'on',color=col.raw[3]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### General Family
```{r}
## cluster 3 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 3 : General Family") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 3 ",
data = cluster_rader$第三群,
pointPlacement = 'on',color=col.raw[4]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### Three Generations Family
```{r}
## cluster 4 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 4 : Three Generations Family") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 4 ",
data = cluster_rader$第四群,
pointPlacement = 'on',color=col.raw[5]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### Single Group
```{r}
## cluster 5 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 5 : Single Group") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 5 ",
data = cluster_rader$第五群,
pointPlacement = 'on',color=col.raw[6]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### Summary
```{r}
power %>%
group_by(分群) %>%
summarise(n = n(), mean=mean(戶均用電),med=median(戶均用電) ) %>%
arrange(分群) %>%
datatable(options = list(pageLength = 5) ,colnames=c("Cluster", "Count", "Mean", "Median"))
```
Column {data-width=350}
-------------------------------------
### Outcome
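Cluster1 : Multi Resident Housing
The proportion of residents with below-college education and the proportion of dwellings with six residents are clearly very high. For example, the well-known [Zhoumei Village, where one household has a hundred registered members](http://www.chinatimes.com/newspapers/20150527000550-260107) falls in this cluster. This cluster has the lowest electricity use per household, possibly because many people sharing one household pulls down the per-household figure; it behaves like an outlier group.
Cluster2 : Disadvantaged Family
Most indicators are close to the overall median, but the below-college proportion is relatively high and the median housing price is relatively low, so we consider these to be somewhat disadvantaged families. They are located in the outer areas around central Taipei.
Cluster3 : General Family
All indicators are close to the overall median, so we named this cluster the general family that makes up the majority. In Taipei, these are most likely nuclear-family households.
Cluster4 : Three Generations Family
The below-college proportion is the lowest, the elderly proportion is relatively high, and the dependency ratio is the highest, which suggests that children are present as well. The median housing price is also the highest, so these are probably well-off families near the city center. This cluster has relatively high electricity usage.
Cluster5 : Single Group
This cluster has the highest single rate. The relatively high elderly proportion suggests that residents may be living with their parents. This cluster also has relatively high electricity usage.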
Five Groups {#comparison_T data-navmenu="Group Comparison"}
=====================================
Electricity use per household in five groups
```{r}
names(power) = c("Region","Electricity_household", "Cluster")
p <- plot_ly(power, y = ~Electricity_household,
alpha = 0.1, boxpoints = "suspectedoutliers" )
p %>% add_boxplot(x = ~Cluster)
```
Cluster4 vs Cluster5 {#comparison1 data-navmenu="Group Comparison"}
=====================================
Column {data-width=650}
-----------------------------------------------------------------------
### Cluster 4: Three Generations Family vs Cluster 5: Single Group
```{r}
## cluster 4 v.s cluster 5
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 4 vs Cluster 5") %>%
hc_subtitle(text = "Single Office Worker vs Winner at the Game of Life",
style=list( color = "#b10026", fontWeight = "bold")) %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 4 - Three Generations Family",
data = cluster_rader$第四群,
pointPlacement = 'on',color=col.raw[5]),
list(
name = " cluster 5 - Single Group",
data = cluster_rader$第五群,
pointPlacement = 'on',color=col.raw[6])
)
```
Column {data-width=350}
-----------------------------------------------------------------------
### Description
HIGH Electricity Utilization
Three Generations Family vs Single Group
-----
Both Cluster 4 and Cluster 5 are regions with high electricity usage, but we can tell the difference between them.
Cluster 1 vs Cluster2 {#comparison2 data-navmenu="Group Comparison"}
=====================================
Column {data-width=650}
-----------------------------------------------------------------------
### Cluster 1: Multi Resident Housing vs Cluster 2: Disadvantaged Family
```{r}
## cluster 1 v.s cluster 2
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 1 vs Cluster 2") %>%
hc_subtitle(text = "Multi Resident Housing vs Disadvantaged Family",
style=list( color = "#b10026", fontWeight = "bold")) %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 1 - Multi Resident Housing",
data = cluster_rader$第一群,
pointPlacement = 'on',color=col.raw[2]),
list(
name = " cluster 2 - Disadvantaged Family",
data = cluster_rader$第二群,
pointPlacement = 'on',color=col.raw[3])
)
```
Column {data-width=350}
-----------------------------------------------------------------------
### Description
LOW Electricity Utilization
Multi Resident Housing vs Disadvantaged Family
-----
Group DINKY use more electricity than group Multi-child Family
Radar chart {#comparison3 data-navmenu="Group Comparison"}
=====================================
### All radar chart
```{r}
## Stacked radar chart
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 1-Multi Resident Housing",
data = cluster_rader$第一群,
pointPlacement = 'on',color=col.raw[2]),
list(
name = "cluster 2-Disadvantaged Family",
data = cluster_rader$第二群,
pointPlacement = 'on',color=col.raw[3]),
list(
name = "cluster 3-General Family",
data = cluster_rader$第三群,
pointPlacement = 'on',color=col.raw[4]),
list(
name = "cluster 4-Three Generations Family",
data = cluster_rader$第四群,
pointPlacement = 'on',color=col.raw[5]),
list(
name = "cluster 5-Single Group",
data = cluster_rader$第五群,
pointPlacement = 'on',color=col.raw[6]),
list(
name = "Total median",
data = cluster_rader$med,
pointPlacement = 'on',color= col.raw[1])
)
```
ClusterMap{#clustermap}
=====================================
Column {data-width=600}
-----------------------------------------------------------------------
### Map
Column {.tabset .tabset-fade data-width=400}
-------------------------------------
### Cluster1
```{r}
power %>% filter(Cluster == "Cluster1") %>%
ggplot( aes(x=Electricity_household)) +
geom_histogram(binwidth=40, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Multi Resident Housing ", x = "Electricity utilization")
```
### Cluster2
```{r}
power %>% filter(Cluster == "Cluster2") %>%
ggplot( aes(x=Electricity_household)) +
geom_histogram(binwidth=25, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Disadvantaged Family ", x = "Electricity utilization")
```
### Cluster3
```{r}
power %>% filter(Cluster == "Cluster3") %>%
ggplot( aes(x=Electricity_household)) +
geom_histogram(binwidth=25, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of General Family ", x = "Electricity utilization")
```
### Cluster4
```{r}
power %>% filter(Cluster == "Cluster4") %>%
ggplot( aes(x=Electricity_household)) +
geom_histogram(binwidth=30, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Three Generations Family ", x = "Electricity utilization")
```
### Cluster5
```{r}
power %>% filter(Cluster == "Cluster5") %>%
ggplot( aes(x=Electricity_household)) +
geom_histogram(binwidth=30, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Single Group", x = "Electricity utilization")
```
Waste of electricity{#waste}
=====================================
Column {data-width=650}
-----------------------------------------------------------------------
### Map
Column {data-width=350}
-----------------------------------------------------------------------
### Outcome
The characteristics of the people who waste electricity
------
Disadvantaged Family
- Nuclear family with higher income.
General Family
- Older people living in rich area.
Three Generations Family
- Extended family with higher income.
Single Group
- People with a higher education level living in old houses.
Conclusion {#conclusion}
============================
**Efficiency Evaluation**
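If those who waste electricity brought their usage down to the average of their own cluster, Taipei could save 8% of its electricity!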
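The kind of arithmetic behind such an estimate can be sketched as follows. This is a toy calculation with invented numbers, not the data behind the 8% figure: the `usage` data frame, its columns, the `waste` flags, and the use of an unweighted per-village cluster average are all assumptions for illustration.

```r
# Toy data: six villages in two hypothetical clusters (invented values)
usage <- data.frame(
  cluster       = c("A", "A", "A", "B", "B", "B"),
  households    = c(100, 100, 100, 200, 200, 200),
  per_household = c(500, 600, 1000, 800, 900, 1300),
  waste         = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Average per-household usage of each village's own cluster
cluster_avg <- ave(usage$per_household, usage$cluster, FUN = mean)

# Flagged villages are brought down to their cluster average; others unchanged
capped <- ifelse(usage$waste,
                 pmin(usage$per_household, cluster_avg),
                 usage$per_household)

# Share of total electricity saved under this scenario
total_before <- sum(usage$per_household * usage$households)
total_after  <- sum(capped * usage$households)
saving_share <- (total_before - total_after) / total_before
print(saving_share)
```

In this toy case the two flagged villages drop to 700 and 1000 per household, so total usage falls from 810000 to 720000, a saving of one ninth (about 11%).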
Sidebar {.sidebar}
============================
Hi everyone, we are **Life is struggle.**
__U-Optimizer__
Government Transparency
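For the government
We hope to help the government design efficient electricity-saving strategies, which requires finding out who is actually wasting electricity. We use clustering to find groups with similar electricity usage behavior, then identify the villages within each cluster whose usage is abnormally higher than the average, and finally use a decision tree to find the characteristics of the villages that waste electricity.
For users
Users can use our product to check whether they themselves are wasting electricity.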
------
__Members:__
1. [Peng-Wen,Lin (林芃彣 Nicole)-
NCCU Department of Statistics](https://www.facebook.com/profile.php?id=100000344369057)
2. [Pei Wen,Yang (楊佩雯 Penny)-
NCCU Department of Statistics](https://www.facebook.com/yangpenny0903?fref=ufi)
3. [Pei Shuan - Haung (黃培軒 Bacon)-
NCCU Department of Statistics](https://www.facebook.com/profile.php?id=100004119858241)
4. [Li-Jer, Lin (林立哲)-
CITIC Housing](https://www.facebook.com/sweetcow)